DB course
Algebra
preliminaries: relation, attribute, schema, tuple, component, domain, current instance, key
operations
- set ops: union, intersect, minus
- filter ops: select
- relation ops: product, joins
- mapping ops: rename, project
equivalence
extended operators
deduplicate
aggregation
grouping
outer join (left,right, and theta)
sorting
datalog
Formalization
FD
key -> tuple and key is minimal, superkey is superset of key
rules
transitive
splitting
combining
trivial
argumentation: {As}->{Bs} => {As,Cs}->{Bs,Cs}
=>{As,Cs}->{Bs} (splitting) one case where rule 3 in ((6365fd79-49de-48d7-acdb-c5507b7d425d)) counts
closure
{superkey}+={all attributes}
check whether a FD follows from a given set
minimal basis
- 1.All the FD’s in B have singleton right sides.
- 2.If any FD is removed from B, B is no longer a basis.
- 3.If for any FD in B we remove one or more attributes from the left side of F, the result is no longer a basis (why?)
projecting
minimize
remove those follow from others
remove left side redundancy
BCNF
any FD As->Bs, As is superkey
decompose: not dependency preserving
chase test for lossless join
3NF
dependency preserving
any FD As->Bs, As is superkey or Bs are each in a key
decompose: find min-basis, a relation for each FD, add a key relation
MD
intuitively, As->->Bs holds if As determine a set of values of Bs.
trivial: As ->-> subset of As, As->->\As(others=)
FD promotion: As->Bs => As->->Bs(in tabular test, )
transitive
complementation: As->->Bs => As->->others
4NF
any MVD As->->Bs, As is superkey
chase test: initially two tuples t and u, use FDs and MVDs to produce v
projecting
simplification(not reviewed)
E-R Model
many->one(less than), many->one(exactly, rounded arrowhead), many—(degree constrains)many, roles
multiway relations -> relation entity&multi binary relations
isa(subclass relationship)
differs from object-oriented approach, entity may be in multi sub entity sets
root entity set
remove redundant entity set: referred once in relation, all attrs are in key
weak entity set (and its supporting relationship)
def: key is composed of attributes , some or all of which belong to another entity set
convert to relational design
combining many-one relationship to many side
weak entity set: don’t need supporting relation
classes: null method, OO method(enumerate all subtrees)
Hard Problems
设有关系 R(A,B,C,D,E,P),F={AB→C,C→D,E→A,B→C,CA→E,BD→A} 1、写出关系 R 的所有候选键,并指出关系 R 为第几范式。要求说明原因。 (4 分) 2、求出该函数依赖集的最小集 Fm。写出求解过程。 (7 分) 3、将 R 分解为具有无损连接性和依赖保持性的 3NF。写出分解过程。 (5 分)
Fm={B->C,C->D,E->A,AC->E,B->A}
SQL lang
components
- DDL: data definition lang, schema and constraint etc.
- DML: data manipulation lang, query and transaction etc.
- DCL: data control lang, permission and grant etc.
relations: stored, view, temp tables
schema
CREATE TABLE/VIEW <name> (...)inline constrains after attribute
or table-wise constrain after all attrs
DROP TABLE/VIEW <name> (...)ALTER TABLE/VIEW <name> (...)
query
sqlSELECT <> FROM <table>[alias],... WHERE{ CONDITIONS } GROUP BY () HAVING <condition>operators
not equal
<>pattern matching:
_,%,'','pattern' ESCAPE '<escape char>'NULL: calc result is NULL, comparison result is UNKNOWN
EXISTS,[NOT] IN,> ALL,> ANY(existence)CROSS JOIN,a JOIN b ON <c>,FULL|LEFT|RIGHT OUTERJOINset operators normally eliminates duplicate, use
a UNION ALL bto avoid
modification
SQL other features
Constraint
kinds
default value
key
- primary key not nullable, unique nullable
- can’t agree on all attrs, unless one is null
- two nulls don’t agree on each other
foreign key
policy: CASCADE, SET NULL, DEFAULT(reject)
not null
trivial
checkassertion
CREATE ASSERTION xxx CHECK (...)
naming
defer
[NOT] DEFERRABLE [INITIALLY DEFERRED/IMMEDIATE]SET CONSTRAINT xxx DEFERRED
scope
attribute: checked only when target attribute changes
tuple: checked more frequently
table(assertions)
Trigger
plaintextCREATE TRIGGER XXX AFTER|BEFORE|INSTEAD-OF UPDATE|DELETE|INSERT ON <table> REFERENCING xxx FOR EACH ROW|STATEMENT [WHEN <condition>] [BEGIN] statements; [END]
View
updatable view
using instead of trigger
materialize view
Index
estimate query cost (simple)
tuple locate
by index, 1r per index access
no index, full table scan
tuple access
if clustered, then may be 1r
else num of pages tuple is distributed
tuple insert
usually zero-cost to find a slot
1r per index update
Storage
disc access model
find track -> move to track -> wait for spinning -> start transferring
move to track: head start/stop + head move(n-tracks/velocity)
TODO spinning: half?avg? :LOGBOOK: CLOCK: [2022-11-02 Wed 08:31:22]—[2023-04-16 Sun 16:27:16] => 3967:55:54 :END:
transferring: size-block(sectors+gaps)/size-track * time per rotation
physical optimize
organize data by cylinders
mirroring disks
enhance reliability
speed up reading(different part per disk) not writing
scheduling
elevator algorithm
intelligent buffering
robustness
checksum
redundant storage(stable storage, RAID)
TODO RAID 4(one main disk, bottleneck) 5(symmetric backup) 6(multiple ) :LOGBOOK: CLOCK: [2022-11-02 Wed 09:25:01]—[2023-04-16 Sun 16:27:21] => 3967:02:20 :END:
storage format
schema
number of fields, type, order, name
fixed-length format
header
pointer to schema, length, timestamp
variable-length format
a pointer to other than first variable field
header->pointers->fixed-fields->var-fields
dedup in-record by another level pointers
separate variable fields
spanned records
pack into blocks
header
one or more other blocks
role of this block(normal data or overflow)
which relation the records belongs to
record offsets
real offset in-block
or forwarding address to another block
or tombstone indicating it’s deleted
timestamp
two-ends growing to center
variable header and record length
addressing
physical disk address
logical address
more compact
tradeoff: flexibility <-> cost
e.g. move records of indirection
logical->phisical
logical->memory
pointer swizzling
avoid repeatedly translating addr
strategies
automatic
on demand
unswizzling when returning to disk
need mem->logical indexing
pinned records and blocks
pinned: object is referred to by a swizzled pointer
use spare bytes(logical-memaddr) to make a linked list
modification
insertion
deletion
immediate reclaim space
mark delete
update
fixed-length easy
dynamic via deletion&insertion
Index
types
dense: one by one key-pointer pairs
sparse: adjacent entries determine a range corresponding to keys
multi-leveled
secondary
must be dense
indirection
reduce replicated key entries
key->bucket of many pointers->tuple
revert index(document retrieval)
data structures
B-trees
rules
- node min item
- leave node
- multiset case: min new key, blank key
insertion and deletion
- recursively split
- borrow from sibling node
hash table
extensible hash table
bucket directory
- initially first 1 bit
- once bucket used up, split the bucket, use k+1 bits, other buckets still remain unchanged
- once directory length not enough, double the directory, b0/b1 points to same bucket
linear hash
multi dimensional
partial match, range, nearest-neighbor
hash like approach
grid file
dynamic insertion: overflow block, or insert grid line
hard to find optimal split
partitioned hashing
good at partial match
tree like approach
kd-tree
- long search path
R-tree
bitmap approach
field -> collection of bitmaps, one bitmap per value
compressed bitmap
- run-length encoding
exercise
3.1 线性表删除时如何处理位数i减少
3.2 hard h(k)的每一位都相同无法区分,若重复数多于bucksize如何split
7.5 如何编码趋近
Exec
Physical Ops
one pass
two pass
sort
index
hash
algebra simplification
cost model
logical & physical plan
exercise
4.3.2
4.3.5
4.4.5、4.4.8 子表少于M如何利用空闲缓冲区
4.5.5 考虑磁盘IO特性,计算开销
Schedule
serializability
conflicts: ignore actions from same trans conflict equiv: by transforming non-conflict ops conflict serializable: conflict equiv to some serial schedule view serializable: ignore blind write
lock
shared-exclusive
upgrading lock
shared->upgrade: may lead to deadlock
update lock: may later write, singleton but can coexist with shared
increment lock
procedure
naive lock -> 2PL: conflict serializable, unlock after all locks
lock table
group mode
lock list: waiting and holding
grant policy
granularity
intention lock
组模式,某种锁优于另一种
tree protocol
观察到无需对父节点修改时unlock
deadlock
timestamp&validation
read/write too late
uncommited TX may abort -> delay until determined
exercise

r1a,w1a,r1b,w1b,r2b,w2b,r2a,w2a r2b,w2b,r2a,w2a,r1a,w1a,r1b,w1b 注意到其对A、B的操作有交换律,
Compare with 18.2.3
2.4
2.6 r3c,r1a,w1c,r2a,w2b,r3b
3.1?

case serial: 2 case interleave: 1A,2B->1B; 2B,1A->1A; 因此只有1A2B/1B2A两组内部可交替,共
4.1 会发生什么怎么说?
4.6
4.7 所有的?更新?如何考虑锁? a. r2c; b. i2c,i1b; c. u1b; d. w1b;
Transaction
failure recovery
undo
- write log->write disk->commit, commit data
redo
- write log->commit->write disk, commit log
- checkpoint: commit data
undo/redo *
checkpoint
- 避免向前追溯
start ckpt(Ti); end ckpt,若有end则Ti无需恢复start ckpt(Ti)中的Ti似乎可以省略?
logical log
logical equiv, compensating action
recoverability
(读)依赖的值必须先commit
日志落盘顺序需与产生顺序一致(group commit)
cascading rollback: dirty data, ACR调度, strict locking
rollback policy
block: cache refresh
smaller: using log
exercise
1.1 2PL和严格封锁的区别? strict只有在事务结束时才能统一释放所有w锁
1.4 “恢复时产生脏读”?
1.5 又是数调度!
2.6
6.2.2 如何计数 归纳递推,转换为向中插入O1
Course Project
杨元钊、钟祎诚、邹奕杨
架构: 整体为C/S架构,Server端采用MySQL数据库,Client使用Python+QtQuick开发。其中Server部分使用一台华为云ECS,根据MySQL官网指示部署了MySQL 8.0(最初使用docker部署,但性能低于常规值3个数量级,后期推测是由于ECS的虚拟化云硬盘IO延迟高)。Client部分为Qt前端+Python后端,主程序由Python调用PySide(Qt)运行QML引擎,引擎运行QML代码实现界面显示与交互,QML代码通过引擎调用后端类获取数据;后端类使用pymysql执行编写好的SQL语句与数据库交互。
分工: 杨元钊负责数据的爬取清洗及数据库Schema的设计、创建; 邹奕杨负责客户端后端类的编写、SQL语句的编写及部分语句优化; 钟祎诚负责服务端的部署、客户端整体和UI部分的编写、SQL优化。
